The purpose of this report is to collect and examine selected techniques for imputing missing data in terms of their impact on the predictive performance of classification algorithms. We consider both basic imputation techniques (median / mode imputation) and more sophisticated ones (selected methods from the mice, VIM, missRanger and softImpute packages).
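As a point of reference for the basic technique named above, here is a minimal sketch of median / mode imputation. It is written in plain Python for illustration only and is not the code used in this report; the helper `impute_column` and the toy columns are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import median, mode


def impute_column(values):
    """Fill None entries with the median (numeric column) or the mode (other columns).

    Hypothetical helper for illustration; missing values are assumed to be None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing observed, so there is no statistic to fill with.
        return list(values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


# Toy columns (made up for this example).
age = [23, None, 31, 40, None]
colour = ["red", "blue", None, "red"]

imputed_age = impute_column(age)        # gaps filled with the median of 23, 31, 40
imputed_colour = impute_column(colour)  # gap filled with the most frequent value
```

The fill value is computed only from the observed entries of the same column, which is why this approach is fast but ignores relationships between variables.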
For testing purposes, we used ranger as the classification algorithm; it is a fast implementation of random forest, particularly suited to high-dimensional data. Predictive performance was assessed using three measures: AUC, balanced accuracy (BACC) and the Matthews correlation coefficient (MCC).
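The three measures can be sketched as follows for a binary problem. This is an illustrative Python sketch using the standard definitions, not the code that produced the results in this report; the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for the example, and both classes are assumed to be present in the test set.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity (assumes both classes occur)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test set (made up for this example).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
bacc = balanced_accuracy(*counts)
mcc = matthews_corrcoef(*counts)
auc_value = auc(y_true, scores)
```

AUC is computed from the predicted scores and does not depend on the threshold, whereas BACC and MCC are computed from the thresholded class predictions. This is one reason the three measures can rank imputation methods differently.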
The report contains all the results, grouped by both package and dataset.
## Imputation time: 0.1
## Test set AUC: 0.916
## Test set BACC: 0.78
## Test set MCC: 0.602
## Imputation time: 0.007
## Test set AUC: 0.955
## Test set BACC: 0.888
## Test set MCC: 0.784
## Imputation time: 0.009
## Test set AUC: 0.56
## Test set BACC: 0.516
## Test set MCC: 0.036
## Imputation time: 0.009
## Test set AUC: 0.932
## Test set BACC: 0.866
## Test set MCC: 0.729
## Imputation time: 0.041
## Test set AUC: 0.996
## Test set BACC: 0.927
## Test set MCC: 0.898
## Imputation time: 0.123
## Test set AUC: 1
## Test set BACC: 0.986
## Test set MCC: 0.983
## Imputation time: 0.015
## Test set AUC: 0.933
## Test set BACC: 0.864
## Test set MCC: 0.774
## Imputation time:
## Test set AUC: 0.917
## Test set BACC: 0.78
## Test set MCC: 0.605
## Imputation time:
## Test set AUC: 0.959
## Test set BACC: 0.896
## Test set MCC: 0.796
## Imputation time:
## Test set AUC: 0.547
## Test set BACC: 0.508
## Test set MCC: 0.018
## Imputation time:
## Test set AUC: 0.927
## Test set BACC: 0.873
## Test set MCC: 0.741
## Imputation time:
## Test set AUC: 0.997
## Test set BACC: 0.917
## Test set MCC: 0.886
## Imputation time:
## Test set AUC: 1
## Test set BACC: 0.98
## Test set MCC: 0.976
## Imputation time:
## Test set AUC: 0.942
## Test set BACC: 0.864
## Test set MCC: 0.774
## Imputation time: 95.437
## Test set AUC: 0.916
## Test set BACC: 0.778
## Test set MCC: 0.601
## Imputation time: 0.33
## Test set AUC: 0.962
## Test set BACC: 0.895
## Test set MCC: 0.797
## Imputation time: 0.481
## Test set AUC: 0.582
## Test set BACC: 0.535
## Test set MCC: 0.076
## Imputation time: 0.158
## Test set AUC: 0.945
## Test set BACC: 0.879
## Test set MCC: 0.754
## Imputation time: 5.582
## Test set AUC: 0.996
## Test set BACC: 0.927
## Test set MCC: 0.898
## Imputation time: 237.384
## Test set AUC: 1
## Test set BACC: 0.98
## Test set MCC: 0.976
## Imputation time: 1.175
## Test set AUC: 0.942
## Test set BACC: 0.876
## Test set MCC: 0.793
## Imputation time: 0.087
## Test set AUC: 0.916
## Test set BACC: 0.779
## Test set MCC: 0.602
## Imputation time: 0.049
## Test set AUC: 0.963
## Test set BACC: 0.895
## Test set MCC: 0.797
## Imputation time: 0.054
## Test set AUC: 0.622
## Test set BACC: 0.588
## Test set MCC: 0.2
## Imputation time: 0.052
## Test set AUC: 0.905
## Test set BACC: 0.856
## Test set MCC: 0.707
## Imputation time: 0.081
## Test set AUC: 0.996
## Test set BACC: 0.907
## Test set MCC: 0.874
## Imputation time: 0.673
## Test set AUC: 1
## Test set BACC: 0.984
## Test set MCC: 0.98
## Imputation time: 0.116
## Test set AUC: 0.933
## Test set BACC: 0.857
## Test set MCC: 0.752
## Imputation time: 0.086
## Test set AUC: 0.916
## Test set BACC: 0.78
## Test set MCC: 0.602
## Imputation time: 0.012
## Test set AUC: 0.933
## Test set BACC: 0.859
## Test set MCC: 0.716
## Imputation time: 0.025
## Test set AUC: 0.551
## Test set BACC: 0.535
## Test set MCC: 0.076
## Imputation time: 0.011
## Test set AUC: 0.931
## Test set BACC: 0.859
## Test set MCC: 0.716
## Imputation time: 0.101
## Test set AUC: 0.997
## Test set BACC: 0.917
## Test set MCC: 0.886
## Imputation time: 1.203
## Test set AUC: 1
## Test set BACC: 0.978
## Test set MCC: 0.974
## Imputation time: 0.026
## Test set AUC: 0.94
## Test set BACC: 0.876
## Test set MCC: 0.793